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Abstract 

Background: Next-generation sequencing technology is an important tool for the rapid, genome-wide 
identification of genetic variations. However, it is difficult to resolve the 'signal' of variations of interest and the 
'noise' of stochastic sequencing and bioinformatic errors in the large datasets that are generated. We report a 
simple approach to identify regional linkage to a trait that requires only two pools of DNA to be sequenced from 
progeny of a defined genetic cross (i.e. bulk segregant analysis) at low coverage (<10x) and without parentage 
assignment of individual SNPs. The analysis relies on regional averaging of pooled SNP frequencies to rapidly scan 
polymorphisms across the genome for differential regional homozygosity, which is then displayed graphically. 

Results: Progeny from defined genetic crosses of Tribolium castaneum (F 4 and F 19 ) segregating for the phosphine 
resistance trait were exposed to phosphine to select for the resistance trait while the remainders were left 
unexposed. Next generation sequencing was then carried out on the genomic DNA from each pool of selected 
and unselected insects from each generation. The reads were mapped against the annotated T. castaneum genome 
from NCBI (v3.0) and analysed for SNP variations. Since it is difficult to accurately call individual SNP frequencies 
when the depth of sequence coverage is low, variant frequencies were averaged across larger regions. Results 
from regional SNP frequency averaging identified two loci, tc_rphl on chromosome 8 and X.c_rph2 on chromosome 
9, which together are responsible for high level resistance. Identification of the two loci was possible with only 5-7x 
average coverage of the genome per dataset. These loci were subsequently confirmed by direct SNP marker 
analysis and fine-scale mapping. Individually, homozygosity of tc_rphl or tc_rph2 results in only weak resistance to 
phosphine (estimated at up to 1.5-2.5X and 3-5x respectively), whereas in combination they interact synergistically 
to provide a high-level resistance >200x. The Xc_rph2 resistance allele resulted in a significant fitness cost relative to 
the wild type allele in unselected beetles over eighteen generations. 

Conclusion: We have validated the technique of linkage mapping by low-coverage sequencing of progeny from a 
simple genetic cross. The approach relied on regional averaging of SNP frequencies and was used to successfully 
identify candidate gene loci for phosphine resistance in 7". castaneum. This is a relatively simple and rapid approach 
to identifying genomic regions associated with traits in defined genetic crosses that does not require any 
specialised statistical analysis. 
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Background 

High-throughput "next-generation" sequencing is now a 
cost effective tool for most genetics laboratories, making 
new approaches for genetic analysis a possibility. One of 
the promising aspects of next-generation sequencing 
data is that it can be used to rapidly define genomic 
regions of interest in linkage and quantitative genetics 
experiments. There are myriad ways that this can be 
accomplished but no straightforward statistical and 
data processing approach has been developed to distin- 
guish the 'signal' from the 'noise'. Some researchers 
have simplified the analysis by establishing specially 
prepared genetic strains, as in linkage by introgression 
[1]. In other systems sequencing of the parental genomes 
has been used to allow parentage to be assigned to single 
nucleotide polymorphisms (SNPs), which is an alterna- 
tive strategy to facilitate identification of linkage to a trait 
or traits. It is challenging, however, to confidently deter- 
mine the parentage assignment of SNPs in genome-wide 
datasets that have a relatively low average coverage 
(<10x), such as might be encountered in organisms with 
large genomes. 

In this study we developed a relatively simple and ro- 
bust method that does not require the parental genomes 
to be known or sequenced. The method involves bulk- 
segregant sequence analysis and averaging frequencies of 
SNPs attaining differential homozygosity to display gen- 
omic regions containing genes for a specific trait. We 
used this method to identify genes conferring resistance 
to an insecticidal fumigant, phosphine (PH 3 ) in a cosmo- 
politan grain insect pest, Tribolium castaneum. Phos- 
phine is an effective fumigant, used worldwide to protect 
grain harvest from insect pests [2]. It has been in use for 
nearly five decades owing to its broad spectrum of acti- 
vity and residue-free characteristics [3,4]. Heavy reliance 
on phosphine has given rise to high-level heritable resis- 
tance in several grain insect pests, threatening the effi- 
cacy of phosphine [5-10]. 

Classical genetic studies on resistance to phosphine in 
two major grain insect pests, the rust red flour beetle, 
T. castaneum [11] and the lesser grain borer, Rhyzopertha 
dominica [12] has confirmed that strong resistance is 
primarily conferred by two major genes, rphl and rph2. 
Recendy, a gene encoding a core metabolic enzyme, 
dihydrolipoamide dehydrogenase (did) has been identi- 
fied as rph2 [13] in both R. dominica and T. castaneum 
and a free soil living nematode Caenorhabditis elegans, 
indicating that resistance mechanisms involve altered 
oxidative metabolism with key respiratory enzymes. T. 
castaneum is a model organism for both genetic and 
developmental studies and the wealth of background 
genetic information with a recently published draft 
genome [14], makes this insect an ideal system in which 
to test novel applications of next-generation sequencing 



to gene mapping. The present study aimed to (i) apply 
next generation sequencing methods to linkage map- 
ping analyses by identifying the genomic location of 
genes conferring resistance to phosphine and (ii) de- 
velop SNP markers for understanding the biological and 
genetic aspects (gene interactions and fitness) of this 
trait on grain insect pests. 

Results 

Sequence analyses and SNP discovery 

Two similar single pair inter-strain crosses (SIC), SICs/sra 
and SIC S / SRB (both S $ X Strong-R S) were established 
(Figure 1) and selected at different generational cohorts 
(F 4 and Fi 9 for the different crosses) for strong resistance. 
Selected (strongly resistant) and unselected progeny of 
these crosses were sequenced using the Illumina 
GAIIx platform and reads were mapped back to the T. 
castaneum genome assembly (NCBI version 3.0). The 
outline of the crossing schemes, selections, read map- 
ping and SNP calling criteria are described in more 
detail in Materials and Methods section. The Illumina 
sequencing results for the generational selected and 
unselected pools are outlined in Table 1. 

Identification of candidate regions 

In order to visualise regions linked to resistance, i.e. se- 
lected for homozygosity in phosphine exposed animals, 
we used a simple spreadsheet analysis of SNP output 
data from CLC Genomics Workbench. CLC Genomics 
Workbench outputs each SNP position, the primary 
variant for each position and the frequency of the pri- 
mary variant, i.e. the ratio of the number of reads of the 
variant/total number of matching reads expressed as a 
percentage. For each generation's (F 4 and F 19 ) datasets, 
we combined the two SNP output tables (selected and 
unselected) for each chromosome into the same spread- 
sheet and put the primary variant frequencies in separate 
columns for each dataset. The data tables were then 
sorted by SNP position, so that the two datasets were 
aligned in order along the chromosome/scaffold and 
then the primary variant frequencies were averaged over 
300 cells in the spreadsheet for the selected and the un- 
selected columns (Additional file 1: Table SI). The win- 
dow size does not need to be fixed at 300, but 300 cells 
gave us the best resolution on these datasets, with the 
least amount of artefacts caused by low coverage. The 
averages were then plotted graphically (Figure 2). For 
the smaller, unassigned scaffolds, we averaged 50 cells 
or less, depending on how many SNPs were called on 
each scaffold. 

In this way, each plotted data point is an average of a 
virtual 'dynamic window) which changes in size on the 
corresponding chromosome sequence depending on the 
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Figure 1 Outline of the crossing scheme and sequencing strategy. 



number of SNPs called. However the window at any 
point always covers the same region for both selected 
and unselected sets and moves down by one SNP each 
time. The actual physical distance which the dynamic 
window covers is determined by the span encompassed 
by the 300 contiguous SNPs in the window. This aver- 
aging causes artefacts due to very low SNP density, such 
as gaps in the scaffold sequence, or high SNP density 
such as SNP clusters) to be smoothed out by the ana- 
lysis. The ideal window size cannot be calculated a 
priori, and is chosen empirically to reveal candidate re- 
gions which should be validated by detailed genetic ana- 
lysis. The window size does not necessarily change with 
number of SNPs called, genome size or variant fre- 
quency. However, the length of the contig in fragmented 
genomes will affect the total number of SNPs called for 
a given contig, so smaller window sizes are preferable 
for shorter sequences. One consideration though, is that 
smaller window sizes are affected by the depth of cove- 
rage and can show up artefact differences between 
datasets that are caused by relatively low coverage. The 
depth of coverage can have large effects on individual 
SNP frequencies and sometimes regions can have very 
low coverage which can produce an average of 'zero' or 

Table 1 Results summary of sequencing and read mapping 



be appear to be homozygous in smaller windows. Larger 
window sizes tend to 'smooth out' the analysis and ac- 
count for SNPs that are missed. In general, there is an 
inverse relationship with the depth of sequencing cove- 
rage and the effective window size that will give reaso- 
nable resolution, i.e. the higher the depth of coverage, 
the better the estimation of SNP frequency will be for 
any particular SNP, and so the smaller the allowable win- 
dow size. 

The simple averaging method for genome wide ana- 
lysis of SNP allele frequencies shows major regions 
linked to phosphine resistance displayed as differential 
SNP frequency caused by local regions of increased 
homozygosity in the selected dataset. The method does 
this without the aid of parentage assignment or other 
complex statistical calculation of association to the trait. 
The validity of the method is demonstrated by the fact 
that the regions found linked to resistance were con- 
firmed in independent crosses from the same parental 
strains. 

In the F 4 data, four regions were found on Chr8, 
(region I; 274 to 896 Kbp, region II: 2.7 to 3.8 Mbp, region 
III: 5.83 to 6.01 Mbp and region IV: 7.96 to 8.54 Mbp), 
two regions in Chr9 (region I: 2.2 to 4.16 Mbp, region II: 



Cross 


Cohort/pool 


Read length(bp) 


No. reads 


Matched reads 


Total bases matched 


Approx. coverage 


No. SNPs called 




F 4 Unselected 


36 (paired) 


49,636,554 


28,1 1 9,799 


1,012,312,764 


5x 


556,683 




F 4 Selected 


36 (paired) 


52,622,706 


28,155,197 


1,013,587,092 


5x 


297,483 


SICs/sra 


F q9 Unselected 


75 (unpaired) 


7,944,144 


6,443,622 


483,271,650 


2.4x 


310,852 


SICs/SRA 


F 19 Selected 


75 (paired) 


1 5,643,072 


1 2,246,096 


918,457,200 


4.5x 


602,068 
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Figure 2 Average SNP frequency across Chr8 Chr9 and Uns-7, from A) F 4 and B) F 19 datasets (Selected vs Unselected). The window size 
used was an average of 300 cells (or 50 cells for Uns-7) in the spreadsheet analysis. The x-axis is a non-linear scale determined by the number of 
SNPs in the dataset. The regions showing linkage to resistance are highlighted by a dashed arrow; solid arrow indicates regions where the 
resistance loci were confirmed by fine-scale linkage mapping. 



5.07 to 7.2 Mbp) and one region in unassigned genomic 
scaffold 7 (DS497671), hereafter referred to as Uns-7, (86 
to 200 Kbp), that constituted candidate regions associated 
with phosphine resistance in the T. castaneum genome 
(Figure 2). Analysis of the F 19 data revealed that on Chr8, 
only one of the four previously identified regions was 
nearly homozygous and therefore confirmed to be linked 
to resistance (Figure 2). In the case of Chr9, neither candi- 
date region previously identified in the F 4 data was con- 
firmed in the F i9 analysis, i.e. no increase in the frequency 
of homozygosity of SNPs was observed in either genomic 
region in animals that survived exposure to phosphine 
(Figure 2). However, a region of the previously identified 
contig Uns-7 was retained as a candidate resistance locus 
(Figure 2). As a result, subsequent analysis was focused on 
region III in Chr8 and the candidate region in Uns-7. No 
other region of the genome showed an appreciable skew 
toward homozygosity of SNPs in selected versus unse- 
lected datasets of the F i9 generation, indicating that no 
other resistance loci were present other than the locus on 
Chr8 and the locus on Uns-7 (Additional file 2: Figure SI). 

Fine scale mapping 

A subset of identified SNPs from Chr8 and Uns-7 that were 
homozygous in selected and heterozygous in unselected Fi 9 
datasets was chosen for further analysis (Additional file 3: 
Table S2). In addition, some Chr9 SNPs were included 
to determine whether the Uns-7 contig mapped to the 
candidate region of Chr9 that had been identified previ- 
ously in the F 4 generation. These SNPs were selected 
based on the availability of restriction enzymes that could 
digest the sequence of one allele, but not the other. The 
local regions around these SNPs were amplified and 
tested for polymorphism using the appropriate restriction 
enzymes (i.e. CAPS analysis). In one case there was an 



informative size polymorphism found after amplification 
and so was included in the analysis. These markers were 
analysed in the Po parents of the SICs/sra cross, the Fi 
pair used to establish the genetically segregating line, 
96 individuals from the F 19 generation that survived 
exposure to 0.8 mg 1/ phosphine and 160 individuals 
from the F 2 8 generation that survived exposure to 
0.5 mg L' 1 phosphine. The naming convention used 
in this study describes tc as an acronym for T. 
castaneum, followed by the type of marker used (p) 
for a PCR-based size polymorphism or c for CAPS 
followed by their approximate genomic location in 
kilo (k) or mega (m) bases. 

For Chr8, we also tested the linkage to resistance of 
three of the four identified regions, in 48 selected 
strongly resistant F 4 individuals of the cross SICs/srb 
that had survived at a high dose of 0.8 mg U . Linkage 
analysis of markers tcp8-777 k (Region I), tcc8-5.975 m 
(region III) and tcp8-8.11 m (region IV) on selected 
strongly resistant F 4 individuals showed that they 
exhibited >75% linkage to resistance. The marker tcp8- 
777 k had 9 recombinants, while the marker tcp8-8.11 m 
had 7 recombinants, corresponding to estimated genetic 
distances of 3.1 cM and 2.5 cM from the resistance locus 
respectively. However, only one locus, tcc8-5.975 m, was 
absolutely linked to resistance in the F 4 SIC S /srb cross and 
was the only one of the four regions that was identified 
in the Fi 9 sequencing and confirmed further in fine 
scale mapping. This led us to conclude that there was 
only one locus on Chr8, in region III, and that we had 
not identified multiple resistance loci on Chr8 that were 
subsequently lost from the population between F 4 and 
F 19 , either due to drift or pleiotropic effects on fitness. 
It also shows that the low coverage of sequencing iden- 
tifies linked regions across several cM in the F 4 . 
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22 recombinants 
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Figure 3 Fine-scale mapping of Chr8 defining the candidate region tc_rph1. The numbers of recombinants for each marker across the 
region are highlighted in pink, with the defined region containing resistance locus tc_rph1 in red. 



For mapping across the tightly linked Chr8 candidate 
region identified in the F19, SNP markers were designed 
across the entire candidate region, (Additional file 3: 
Table S2) and tested in 160 strongly resistant F 2 s indi- 
viduals. Individuals that were non-recombinant for these 
SNP markers across the tc_rphl candidate region were 
excluded from the analysis. The results of fine scale 
mapping defined a candidate region for Xc_rphl from 
tcc8-5.975 m to tcc8-6.04 m (Figure 3). Similarly, the 
marker tccU7-138.2 k with no recombinants in 83 
strongly resistant selected F19 individuals confirmed the 
close linkage to second resistance locus, tc_rph2, on 
Uns-7 (Figure 4). 

Positioning the unassigned scaffold (Uns) - 7 into 
chromosome 9 

The loss of homozygosity seen on Chr9 in the F i9 
dataset, compared to the observed homozygous peaks in 
F 4 datasets, suggested that an error in genome sequence 
assembly had resulted in the omission of a segment of 
DNA sequence from Chr9 and this indicated that a re- 
sistance locus did indeed map to Chr9. In such a situ- 
ation, the finer scale of the F19 genetic map would result 
in the loss of apparent linkage to Chr9. The fact that 
Uns-7 did contain a resistance locus and was not 
assigned to any particular site in the genome indicated 
that it could actually map to the region of Chr9 origi- 
nally identified as a resistance locus in Chr9. To test this 



hypothesis, we mapped the relative genetic positions of 
three SNP markers: tcc9-3.05 m, tcp9-3.70 m (both from 
region I of Chr9) and tccU7-138.2 k, the marker for 
tc_rph2 locus (from Uns-7) in 92 individuals of an F 2 
population of the SICs/sra cross (Figure 1). We found 
that marker tccU7-138.2 k is flanked by tcc9-3.05 m and 
tcp9-3.70 m, with map distances of 1.1 cM (LOD = 34.9) 
and 1.6 cM (LOD = 35.9), respectively, confirming our 
hypothesis that Uns-7 is a part of Chr9 (Figure 4), and 
most likely positioned in a gap of unknown size between 
two scaffolds (NW_001093536.1 and NW_001093382.1), 
designated 'new_gap6782' (Figure 2). 



Candidate genes for phosphine resistance 

The technique identified a target locus on the Uns-7 
scaffold, even though it was not assembled into the lar- 
ger genome and was itself incomplete and had many 
gaps in the sequence. This scaffold contains 19 predicted 
genes, of which two (TC006822 and TC006823) are ac- 
tually the first and last exons of the dihydrolipoamide 
dehydrogenase (did) gene (Table 2 and Figure 4). We 
used our paired end sequencing data to de novo assem- 
ble the region containing DLD, so that all the exons were 
represented in the genomic sequence. This assembled contig 
has been deposited into Genbank (KF032715). Comparative 
genetic analysis has confirmed this gene as the phosphine 
resistance locus rph2 in T. castaneum, R. dominica and the 



Chr9 (CM000284) 



tcc9-3.05m tccU7-1 38.2k tcp9-3.70m 

I 1 .1cM (LOD=34.9) I 1 ,6CM (LOD=35.9) 



(l_OD=34.9) i 

_. ..j 1— - 



Uns-7 (DS497671) 



DLD 



Figure 4 Linkage mapping of unassigned scaffold (Uns) - 7 postioning it on chromosome 9. The region highlighted in red on Uns-7 
contains the known DLD resistance locus (tc_rph2). 
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Table 2 List of candidate genes for phosphine resistance in Chr8 and Chr9 (Uns-7) 



Gene ID Protein accession Homology in Drosophila E-value Predicted biological function/identity 



Chr8 










TC005925 


EFA09201 


CG5255 


7.00E-28 


Trypsin like serine protease H65 


TC005926 


EFA08291 


NA 


NA 


Unknown function 


TC005927 


EFA08292 


CG 13804 


3.00E-150 


Yellow-g2 


TC005928 


EFA08293 


CG6543 


3.00E-158 


Cyclohex-1-ene-1-carboxyl-CoA hydratase; enoyl CoA hydratase 


TC005929 


EFA08294 


CG2663 


3.00E-36 


Sec14p-like lipid-binding domain; Alpha tocopherol transfer protein 


TC005930 


EFA08295 


CG 13096 


1 .00E-28 


Ribosomal protein L1 


TC005931 


EFA08296 


NA 


NA 


Unknown function 


TC005932 


EFA08297 


CG5670 


0 


Na(+)/K(+) ATPase alpha subunit 


TC006224 


EFA08569 


CG 12880 


5.00E-29 


Unknown function 


TC006225 


EFA08570 


CG7463 


1 .00E-06 


Yellow-k 


TC006226 


EFA08571 


CG5717 


3.00E-160 


Yellow-g 


TC006227 


EFA08572 


CG9792 


1.00E-116 


Yellow-e 


TC006228 


EFA08573 


NA 




Unknown function 


TC006229 


EFA08574 


CG9891 


9.00E-89 


Yellow-e3 


TC006230 


EFA08575 


CG1629 


2.00E-132 


Yellow-h 


TC006231 


EFA08576 


CG13279 


9.00E-104 


Cytochrome b5 related; Fatty acid desaturase 


TC006232 


EFA08577 


CG13279 


1.00E-117 


Cytochrome b5 related; Fatty acid desaturase 


Chr9 (Uns-7) 










TC006821 


EFA13109 


CG17514 


0 


Translational activator gcn-1, partial; Translator activator activity 


TC006822 


EFA131 10 


CG7430 


1 .00E-48 


Dihydrolipoamide dehydrogenase E3 subunit 


TC006823 


EFA13111 


CG7430 


1 .00E-82 


Dihydrolipoamide dehydrogenase E3 subunit 


TC006824 


EFA13130 


CG42698 


7.00E-1 1 1 


Pou domain motif 3 


TC006825 


EFA13112 


CG6551 


3.00E-104 


Serine/threonine protein kinase 


TC006826 


EFA13113 


CG6551 


8.00E-38 


Serine/threonine protein kinase 


TC006827 


EFA13114 


CG6551 


6.00E-38 


Serine/threonine protein kinase 


TC006828 


EFA13115 


CG6178 


5.00E-87 


Fatty acetyl coA synthese activity 


TC006829 


NA 


CG 1 1 7 1 5 


437E-08 


Cytochrome P450; Cyp4g15 


TC006830 


EFA13116 


CG3800 


1 .00E-03 


Polyprotein; Nucleic acid binding protein 


TC006831 


EFA13117 


CG 14660 


3.90E-02 


Putative serine protease; Labial associated factor 


TC006832 


EFA131 18 


CG3104 


1.00E-18 


Dopamine transporter; Ankyrin repeats involving in protein-protein interactions 


TC006833 


EFA131 19 


CG6178 


S.00E-102 


AMP binding enzyme; Fatty aceyl coA synthese activity 


TC006834 


EFA13120 


CG6551 


6.00E-17 


Serine/threonine protein kinase; ATP Binding 


TC006835 


EFA13121 


CG3104 


2.00E-18 


Dopamine transporter; Ankyrin repeats involving in protein-protein interactions 


TC006836 


EFA13122 


NA 


NA 


Unknown function 


TC006837 


EFA13123 


CG42231 


3.00E-10 


Unknown function; Uncharacterized conserved protein 


TC006838 


EFA13124 


CG5245 


625E-49 


Nucleic acid/Zinc ion binding 


TC006840 


EFA13125 


CG 14938 


3.00E-56 


Nucleic acid/Zinc ion binding 


nematode, Caenorhabditis t 


zlegans [13]. This was 


taken as 


these genes have strong homologies to genes of known 


a validation of the SNP averaging technique. 




function, the functions of most of them are either 


The candidate region 


defined by markers tcc8-5 


ill-defined or unknown. Therefore further work is 


.95 m and 


tcc8-6.04 m 


on Chr8 contains 


17 gene 


required to identify the resistance gene within this 


predictions (Table 2 and Figure 3). While 


some of 


region. 
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Epistatic interactions between the two resistance loci 

The relative contribution to the resistance phenotype of 
the two loci tc_rphl and tc_rph2 was estimated by geno- 
typing surviving F 4 individuals of the cross SICs/sra 
(Figure 1) exposed to a range of phosphine concentra- 
tions. Robust quantification of the relative strengths is 
difficult, but an estimate of the relative strengths of each 
genotype can be gleaned from Table 3 using the range of 
values that the LC 99 . 9 is likely to fall between. The rela- 
tive strength of the phenotype for each genotypic com- 
bination was SS (tc_rphl ss IX.c_rphT \ 0.02-0.03 mg I/ 1 ) 
< SR {tc_rphritc_rph? s , 0.03-0.05 mg I/ 1 ) < RS (tc_rphl ss / 
tc_rphT, 0.06-0.1 mg L _1 )<RR {tc_rphl rr /tc_rph2 n ] 
>4.0 mg L' 1 ). The LC99.9 value for the fully sensitive 
(QTC4) parental strain as calculated previously is 
0.02 mg I/ 1 [11]. Using this as a reference value we can 
estimate the individual contribution of Xcjrphl and 
tc_rph2, as approximately 1.5-2.5x and 3.0-5.0x, respec- 
tively. When homozygous for both resistance loci, the 
relative resistance ratio is >200x, demonstrating a strong 
synergistic interaction between the two loci. 

Relative fitness of tc_rph1 and tc_rph2 

We estimated the change in frequency of resistance 
alleles in the absence of phosphine selection over 18 gene- 
rations, from F 2 to F 20 , using the markers tcc8-5.975 m 
(tc_rphl) and tccU7-138.2 k {tc_rph2) and analysed devi- 
ation from Hardy- Weinberg equilibrium using chi-square 
test (Table 4). The results showed a steady increase in the 
homozygous-resistant genotype tc_rphl rr in each gene- 
ration from F 5 to F 20 & = 18.8; P < 0.001, df = 1), indicat- 
ing that tc_rphl may have a selective fitness advantage 
associated with phosphine resistance. In contrast, tc_rph2 
showed a significant decrease in resistant homozygotes 



{tc_rph2") coupled with a steady increase in sensitive ho- 
mozygotes (tc_rph2 ss ) over 18 generations (x 2 = 75.2, P < 
0.001, df = 1) indicating a strong fitness cost associated 
with this resistance allele in the absence of selection. 

Discussion 

Finding single nucleotide changes that cause large 
phenotype changes in highly polymorphic genetic back- 
grounds is a formidable task. In the present study we 
have taken advantage of next-generation sequencing 
technology and the published reference genome se- 
quences of T. castaneum to rapidly identify the genomic 
locations of phosphine resistance loci in T. castaneum. 
Our results demonstrate the power of our simple aver- 
aging method to scan for local regions of loss of hetero- 
zygosity, which is preferable to standard linkage 
mapping for identifying gene regions responsible for 
well-defined traits in advanced intercrosses. The method 
does not rely on a high quality reference genome, high 
depth of sequence coverage, use of inbred strains or 
parentage assignment of SNPs. The whole genome in- 
cluding multiple unassigned scaffolds can be easily 
scanned to identify linked regions. 

The simple averaging method also has advantages in 
that it can reduce artefacts caused by gaps in reference 
genome assemblies. Large gaps that are included in the 
assembled sequence, as is the case for T. castaneum, can 
cause major problems for analyses that rely on defined 
physical distances, as there would appear to be no vari- 
ants or genes within these gaps. Variable SNP density 
can also cause problems for analyses that rely on defined 
physical distances. Our technique eliminates this prob- 
lem by allowing the SNPs in the datasets to define the 
region to be averaged and how far the analysis window 



Table 3 Relative contribution of resistance loci, tc rphl and tc rph2 

Genotypes Dose of phosphine (mg L" 1 ) 



tc_rph 1 tc_rph2 


US* 


0.008 


0.01 


0.02 


0.03 


0.05 


0.06 


0.1 


0.2 


0.5 


0.8 


1 


2 


3 


4 


ss ss 


6 


1 


0 


1 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


SI ss 


16 


1 1 


6 


4 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


11 ss 


15 


8 


6 


15 


9 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


ss SI 


8 


A 


2 


2 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


SI SI 


25 


21 


29 


5 


1 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


II SI 


18 


16 


19 


31 


19 


5 


12 


0 


0 


0 


0 


0 


0 


0 


0 


ss II 


3 


0 


2 


1 


1 


/ 


2 


0 


0 


0 


0 


0 


0 


0 


0 


SI II 


2 


3 


3 


2 


10 


9 


6 


8 


1 


3 


0 


0 


0 


0 


0 


11 11 


0 


1 


4 


/ 


4 


11 


3 


5 


7 


/ 


4 


3 


3 


3 


1 


No. analysed 


93 


65 


71 


68 


44 


32 


24 


13 


8 


10 


4 


3 


4 


3 


1 


PCR amplified 


96 


90 


/I 


68 


44 


32 


28 


14 


10 


10 


4 


A 


A 


3 


2 


Survivors 


202 


107 


155 


90 


75 


40 


30 


14 


11 


11 


4 


6 


A 


3 


3 


Total no.of insects bioassayed 


202 


130 


201 


201 


200 


313 


204 


295 


312 


300 


320 


302 


310 


300 


320 



*US = Unselected beetles. 
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Table 4 Fitness analysis of resistance loci, tc rphl and tc rph2 over 18 generations in the absence of phosphine 
selection 



Generations 


Insects 
tested 




Genotypes 




Allelic frequency 


Chi-square analysis on genotypes' 


rr 


rs 


ss 


p allele 


q allele 


/(df=1) 


P value 


tcjphl 


















F 2 


94 


28 


41 


27 


0.51 


0.49 


2.15 


0.143 


F 5 


94 


34 


43 


17 


0.59 


0.41 


6.83* 


0.009 


F,o 


96 


38 


45 


13 


0.63 


0.37 


1 3.4** 


<0.001 


F,5 


95 


35 


46 


14 


0.61 


0.39 


9.38* 


0.002 


F20 


96 


38 


48 


8 


0.65 


0.34 


1 8.8** 


<0.001 


tc_rph2 


















F 2 


94 


20 


52 


22 


0.49 


0.51 


1.15 


0.283 


F 5 


96 


5 


54 


37 


0.33 


0.67 


22.8** 


<0.001 


F,o 


92 


/ 


38 


45 


0.29 


0.71 


33.6** 


<0.001 


F,5 


96 


0 


28 


64 


0.15 


0.85 


99.0** 


<0.001 


F20 


96 


0 


36 


58 


0.19 


0.81 


75.2** 


<0.001 



* Significant (P<0.01); ** Significant (P< 0.001). 

f Significance values are calculated from deviation from Hardy-Weinberg equilibrium compared to expected F 2 (1:2:1) ratio. 



is shifted along the chromosome for each iteration, ra- 
ther than a user-defined physical distance. This is easily 
visualised with spreadsheet software, such as Excel*". 

Our analysis defined two genomic regions on Chr8 
and Chr9 (Uns-7) containing loci responsible for high- 
level resistance, tc_rphl and tc_rph2. The results of F 4 
SNP analysis showed discontinuous SNP frequency sig- 
nals in the candidate regions of both Chr8 and Chr9, 
represented as separate peaks in the F 4 SNP frequency 
graphs (Figure 2) and suggested the possible disorienta- 
tion or erroneous order of chromosomal scaffolds in the 
reference genome, either due to lack of genetic markers 
or low recombination in these regions in the crosses 
used to create the linkage map [15]. Our inheritance 
analysis of selected SNP markers from Chr8, Chr9 and 
Uns-7 on a F 2 mapping population confirmed this and 
identified the location of Uns-7 on Chr9 (new_gap6782). 
Similar chromosomal sequence discrepancies on the T. 
castaneum reference genome especially on Chr9 were 
observed by Wang et al. [16] and the respective se- 
quences were later rearranged to their proper location. 

Candidate genes 

Our next-generation sequencing and mapping results 
show that the major candidate resistance gene for 
tc_rph2 is on Uns-7. This scaffold contains the 
dihydrolipoamide dehydrogenase (did) gene, which was 
confirmed as a resistance locus using more traditional 
methods that required much greater effort and expense 
[13]. This demonstrates the validity, power and effi- 
ciency of the simple averaging method. All sequence 
scaffolds, including those not assigned to chromosomes 



can be scanned independently and rapidly for linkage to 
a trait. Since more than 22 Mb (10%) of the T. castaneum 
genome sequence is still unassigned to genomic loca- 
tions, this represents a powerful way of detecting linkage, 
even across smaller regions in fragmented genome 
assemblies. 

At the tc_rphl locus, the candidate gene list contains 
17 predicted genes. However, none of them are associ- 
ated with the same pathways as did, consistent with the 
hypothesis of strongly resistant insects having at least 
two different mechanisms of resistance to phosphine 
[2,17-20]. Although it is difficult to discuss the potential 
roles of each of the candidates, genes in the mapped 
interval are involved in lipid metabolism, chitin metabo- 
lism, membrane permeability and maintaining ion gradi- 
ents. A resistance gene in any of these pathways would 
be consistent with previous observations on phosphine 
mode of action through oxidative stress and lipid peroxi- 
dation and resistance through reduced uptake [13,19-24]. 

Epistatic interactions between the two resistance loci 

We used markers tightly linked to resistance loci tc_rphl 
and tc_rph2 to determine their relative phenotypic ef- 
fects and genetic interactions in their homozygous and 
heterozygous states. In the genotype analysis, we showed 
that the resistance locus tc_rph2 exhibited a stronger 
phenotypic effect compared to tc_rphl. Both resistance 
loci are incompletely recessive but when both in a 
homozygous state they interact synergistically with re- 
sistance levels greater than 200 x. This is a conservative 
estimate based on survival of the rphl" Irph2 rr genotype 
at doses up to 4.0 mg L" 1 , the highest dose tested. These 
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results are in accordance with previous results showing 
synergistic interactions between resistance genes with 
resistance levels up to 431 x, estimated using LC50 
sponse values [11]. The synergism observed between 
the resistance alleles of T. castaneum resembles the 
synergistic interaction of rphl and rph2 in R. dominica 
[6]. These results are also consistent with the two gene 
model of high-level phosphine resistance seen in R. 
dominica [6,12,25]. 

Pleiotropic effects on fitness 

Previous analysis [11] on the relative change in the re- 
sistance phenotype in the absence of phosphine selection 
showed the existence of a possible fitness disadvantage 
in a T. castaneum population segregating for strong re- 
sistance alleles. In the present study, we observed a sig- 
nificant decrease in the frequency of the tc_rph2 rr 
homozygous resistant genotype over 18 generations with 
a corresponding increase in heterozygote (tc_rph2 rs ) and 
susceptible genotypes {tc_rph2 ss ). This directly shows 
the presence of significant selective fitness disadvantage 
for homozygotes at the tc_rph2 locus. Field population 
based fitness studies [26-29] on phosphine resistant in- 
sect strains of R. dominca, T. castaneum, Sitophilus 
oryzae, Cryptolestes ferrugineus and Oryzaephilus 
surinamensis reported the existence of biological fitness 
cost associated with phosphine resistance. 

In contrast to tc_rph2, we observed a significant in- 
crease in the frequency of the homozygous resistant allele 
of tc_rphl, suggesting a selective fitness advantage of 
weakly resistant homozygotes (tc_rphl rr ) compared to 
susceptible genotypes. These results are consistent with 
previous fitness estimates on weakly resistant strains of 
T. castaneum, R. dominica and 5. oryzae [30] and shows 
that there are fundamental differences between the 
genes that are associated with each resistance locus. 

Conclusion 

The results of the present study used a whole genome 
resequencing approach and we validated our method of 
processing the sequences for rapid scanning of polymor- 
phisms and identifying the candidate genes associated 
with phosphine resistance trait in T. castaneum by linkage 
mapping. Although we have only validated the technique 
for a strongly selected and previously characterized trait, 
theoretically it is transferrable to nearly any species that 
has some genomic or transcriptomic reference data 
against which reads may be aligned, whose breeding may 
be manipulated for the generation of specific populations 
for linkage mapping. The detailed genotype analysis using 
identified SNP markers demonstrated that high-level re- 
sistance in T. castaneum is a result of two synergistic 
genes, tc_rphl and tc_rph2 with the latter having signifi- 
cant fitness cost. The fine-scale mapping analysis also 



defined the genomic regions containing these loci. The 
results show the validity of the technique and that it can 
be applied in cases where parental genotypes are unknown 
and in circumstances where the genome coverage con- 
ferred by sequencing is relatively low, which would often 
be the case for organisms with large genomes. 

Methods 

Beetle strains 

Two strains of T. castaneum were used, a phosphine 
susceptible strain, QTC4 ('S' strain) and a strongly resis- 
tant strain, QTC931 (Strong-R). The 'S' strain was de- 
rived from adults collected from a storage facility in 
Brisbane, southeast Queensland, Australia in 1965 [31] 
and since then, the population of this strain has been 
maintained in the laboratory without exposure to phos- 
phine and is fully susceptible to phosphine. The Strong-R 
strain was derived from adults collected from central sto- 
rage at Natcha, southeast Queensland, and Australia in 
2000 [11]. Homozygosity for resistance was selected within 
the 'Strong-R' strain by phosphine selection for at least five 
generations. In addition, both S strain and Strong-R exhibit 
a linear response to phosphine in a probit mortality re- 
sponse analysis, indicating their respective homozygo- 
sity for the susceptibility/resistance trait [11]. The insects 
were cultured in whole wheat flour and yeast 20:1 and 
maintained at 30°C and 55% relative humidity (RH). 

Genetic crosses 

Two similar single pair inter-strain crosses (SIC), SICs/sra 
and SICs/srb (both S $ X Strong-R S) were established 
(Figure 1) between these strains as described previously 
[11]. For each cross, single virgin adult male and female 
siblings of the Fi progeny were mated to derive 194 F 2 
individuals, which were allowed to mate freely for sev- 
eral weeks to produce the F 3 generation. After produ- 
cing the F 3 generation, 94 F 2 individuals were retained 
and used for linkage mapping and SNP validation. The 
F 3 and subsequent progeny were kept as discrete gener- 
ational cohorts up to the F 2 s (Figure 1), and each further 
generational cohort consisted of approximately 500 to 
1000 beetles. The SICs/sra cross was established inde- 
pendently 20 months before the SICs/srb cross and 
aimed to establish genetic map for T. castaneum. The 
latter (SICs/srb) cross was established for the express 
purpose of genome resequencing to identify larger can- 
didate regions that could then be fine-scale mapped in 
the later generational cohorts (F 19 ) of the SICs/sra cross 
without having to wait another 16 months for the gener- 
ations to progress. As parental strains, S and Strong-R 
of these two crosses were highly homozygous in 
responding to the resistance trait, these two crosses 
were considered as a replicates. 
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Fumigation 

Phosphine gas generation and all fumigation bioassays 
were performed for 20 hours at 25°C in sealed airtight 
desiccators as described previously by Daglish et al. [32] 
and Jagadeesan et al. [11]. 

DNA extraction and sample preparation for sequencing 

A high discriminatory dose (0.8 mg I/ 1 ) selection was 
performed to isolate individuals that were homozygous 
for resistance trait by exposing more than 10,000 F 4 pro- 
geny of the cross SICs/srb and 20,000 F i9 progeny of the 
cross SIC S/SRA (Figure 1). For each cross, total genomic 
DNA was extracted from 100 survivors of each selection 
and 100 unselected beetles. The DNA was extracted 
using a QIAGEN DNeasy extraction kit according to the 
manufacturer's protocol. The F 4 samples were sequenced 
using the Illumina GAIIx platform with paired-end reads 
of 35 bp and the F 19 was subsequently sequenced with 
75 bp (paired end for selected, unpaired for unselected) 
(Figure 1). 

DNA extraction and sample preparation for SNP 
genotyping and mapping 

A modified high-salt DNA extraction procedure [33] 
was adopted for genotyping individual insects. Genomic 
DNA was extracted from the Pn parents, Fi hybrids, F 2 , 
F 4 , F 19 and F 28 individuals of the SIC S /sra intercross for 
fine-scale mapping of the resistance locus. Briefly, indi- 
vidual insects were placed in a 96-well PCR microtitre 
plate, snap frozen in liquid nitrogen and homogenized in 
100 uL lysis buffer (0.4 M NaCl, 10 mM Tris-HCl 
pH 8.0 and 2 mM EDTA pH 8.0) using a multi pin 
header, for 10-15 s. Then 4 \\L of 10% SDS (4% final 
concentration) and 1 uL of 10 mg ml" 1 proteinase K 
(100 \ig ml" 1 final concentration) were added to each 
sample and mixed well. The samples were incubated at 
37°C for overnight, after which 25 uL of 6 M NaCl 
(NaCl saturated H 2 0) was added to each. Samples were 
vortexed for 30 s at maximum speed, and tubes 
centrifuged for 45 min at 1500 g. The supernatant was 
transferred to fresh tubes. To each sample, an equal 
volume of ice cold 100% ethanol was added, mixed 
well, and then centrifuged for 45 min, at 1500 g. The 
pellet was washed with 70% ethanol, air dried and 
resuspended in 25 uL sterile distilled H 2 0. Finally, 
the samples were kept at 65°C for 10 min to denature 
any residual DNAses. The genomic DNA template 
was subsequently quantified and diluted to 2 ng ul/ 1 
before inclusion in PCR. 

Mapping short reads to the T. castaneum reference 
genome 

The selected and unselected re-sequenced data sets from 
F 4 and F 19 progeny of the crosses SIC s/S rb and SIC s/S ra 



(Figure 1) were mapped separately to the reference T. 
castaneum genome (version Tcas_3.0, from NCBI) using 
CLCBio Genomics Workbench v4.9 software [34]. The 
F 4 and F i9 sequence data sets were mapped to the refe- 
rence under the read mapping algorithm. Read mapping 
was performed with high stringency, requiring a perfect 
match and no unaligned ends with the following default 
parameters, Match +1 Mismatch - 2; Deletion - 3 and 
Insertion - 3. 

SNP detection 

Quality-based SNP detection in CLC Genomics Work- 
bench is based on the Neighbourhood Quality Standard 
(NQS) algorithm described by [35,36]. For a SNP to be 
called, the default quality specifications such as window 
length (11), maximum number of gaps and mismatches 
(2), minimum average quality of surrounding bases (15) 
and minimum quality of the central base (20) were 
chosen together with a minimum coverage of six and 
minimum variant frequency of 35% within a pool. 

The SNP detection output tables for selected and un- 
selected F 4 and Fi 9 datasets obtained were directly 
exported to a separate Excel™ spreadsheet. SNP datasets 
for selected (resistant) and unselected (segregating) pro- 
geny were compared by separating the data by gener- 
ation and chromosome, and sorted according to the 
reference base position and with the frequency of the 
primary variant, i.e. the most common allele for a SNP 
regardless of reference base identity, for the selected and 
unselected datasets being in separate columns. The fre- 
quency of the primary variant was then averaged over 
300 cells in the spreadsheet for each dataset for chromo- 
somes and 40 cells for unassigned scaffolds, given that 
the number of SNPs/scaffold is greatly reduced due to 
their smaller sizes. 

SNP validation and analysis 

A subset of SNPs were selected from identified candi- 
date gene regions for further investigation from regions 
where the averaged SNP allelic frequency attained near 
homozygosity (90-100%) in selected datasets but 
remained heterozygous (-50-80%) in the unselected 
datasets (Additional file 3: Table S2 and Figure 1). SNPs 
of interest that were identified to be on restriction en- 
zyme (RE) recognition sites, and thus amenable to CAPS 
analysis, were amplified from individual insects by PCR. 
For each insect, 4 ng of template DNA was amplified in 
a reaction volume of 25 \\L containing: lx PCR buffer, 
4 mM MgCl 2 , 0.8 mM dNTPs (New England Biolabs), 
1.25 U ul" 1 Taq DNA polymerase (Bioline), 0.4 uM spe- 
cific forward and reverse primers. The PCR conditions 
were: denaturation for 2 min at 95°C; then 40 cycles of 
95°C for 45 s, 52°C for 45 s, 72°C for 45 s with a final ex- 
tension at 72°C for 2 min. Restriction digestions were 
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carried out in 15 uL final volume containing 5 uL of the 
PCR product, lx concentrated restriction enzyme buffer 
with optional lx Bovine Serum Albumin (BSA), and 1-3 
U of RE for at least one hour to overnight at the 
approriate temperature. Products were analysed by 2.5% 
agarose gel electrophoresis (lx TAE, 200 V for 30 min). 

Mapping of candidate regions 

The CAPS markers from candidate regions in Chr8 
(Genbank accession CM000283), Chr9 (CM000284) and 
Uns-7 (DS497671) were genotyped in parents and Fi for 
polymorphism and tested for Mendalian segregation in the 
segregating F 2 . The F 2 genotypes were analysed by Map 
Manager QTXb20 and tested for 1:2:1 segregation of par- 
ental alleles using G test at 95% confidence interval 
[37,38]. Marker order was determined with a LOD score 
threshold of 3.0 and map distances were estimated by the 
Kosambi function [39]. 

The fine scale mapping was carried out on markers 
linked to resistance loci tc_rphl and tc_rph2 to estimate 
the distance between each marker and their respective 
target genes. This was achieved by genotyping the segre- 
gation profile of markers linked to tc_rphl and tc_rph2 
in survivors of F 4 (SIC s/S rb)> F 19 (SIC S / S ra) selected at 
high discriminating dose of 0.8 mg I/ 1 and F 2 8 (SICs/sra) 
progeny at 0.5 mg L" 1 , respectively (Figure 1). We reduced 
the discriminatory dose from 0.8 to 0.5 mg I/ 1 for the se- 
lection at F 2 8 progenies, owing to the relatively small size 
of population of 4700 beetles produced in this generation 
and moreover, this will not affect selecting strongly re- 
sistant homozygous insects [11]. Surviving individuals, 
not homozygous for the resistance alleles (tc_rphl and 
tc_rph2) were scored as recombinant at that locus. Indi- 
viduals who were heterozygous across the candidate re- 
gion were excluded from the analysis. 

Determination of relative contribution and fitness cost of 
tc_rph1 and tc_rph2 

To identify the relative contributions of resistance loci, 
tc_rphl and tc_rph2, a selection at series of concentrations 
of phosphine from 0.008, 0.01, 0.02, 0.03, 0.05, 0.06, 0.1, 0.2, 
0.5, 0.8, 1.0, 2.0, 3.0 and 4.0 mg L'\ each with approxi- 
mately 150-200 insects of F 4 progeny of the cross SICs/sra 
was performed (Figure 1) and the surviving individuals at 
each concentration were genotyped using the respectively 
linked SNP markers, tcc8-5.975 m and tccU7-138.2 k. The 
effect of resistance phenotypes for different allelic combina- 
tions was measured by comparing the genotype pattern of 
these markers over increasing concentrations of phosphine. 
The same markers were also used to investigate changes in 
allele frequency associated with resistant (rr) and sensitive 
(ss) alleles of both the loci tc_rphl and tc_rph2 in the 
absence of selection. The beede populations not exposed to 
phosphine (unselected) were sampled randomly (-96) at 



discrete generations, F 5 , Fi 0 , Fi 5 and F 20 and genotyped 
for both the resistance loci to determine any observable 
fitness/heritability effects associated with resistance loci 
in the absence of selection using a chi-square test. 

Additional files 



Additional file 1: Table SI. An example table for averaging variant 
(SNP) frequency for chromosome 8 for the F 4 generation Selected vs 
Unselected datasets in an Excel™ spreadsheet. 

Additional file 2: Figure SI. Illustrating the average SNP frequency 
across all chromosomes from F 4 and F 19 datasets (Selected vs Unselected) 

Additional file 3: Table S2. Listing markers used for mapping 
resistance loci on Chr8 and Chr9 (Uns-7). 
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